Intro

In our last episode, we looked at Seoul Bike Stations across the city of Seoul.

In this post, we will explore the trips taken in April of 2021.

On top of the GPS/location feature from the station dataset, the trips dataset has a time feature (time of rent, time of return), which raises exciting questions like:

How do trips differ by county, hour of the day, or day of the week?

Do counties or stations have distinct characteristics with which we can cluster them into a few groups?

Data

Let's join the Rent station first.

Now let's join the Return station.
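The two joins can be sketched roughly as follows. This is a minimal illustration with pandas, not the original notebook code: the column names (`rent_station_id`, `return_station_id`, `station_id`, `county`) and the toy rows are assumptions, since the real schema is not shown here.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions, not the real schema.
trips = pd.DataFrame({
    "rent_station_id": [101, 102, 101],
    "return_station_id": [102, 103, 101],
})
stations = pd.DataFrame({
    "station_id": [101, 102, 103],
    "county": ["Mapo", "Yongsan", "Gangnam"],
})

# Join station attributes for the rent side ...
rent_side = stations.rename(
    columns={"station_id": "rent_station_id", "county": "rent_county"}
)
trips = trips.merge(rent_side, on="rent_station_id", how="left")

# ... and once more for the return side.
return_side = stations.rename(
    columns={"station_id": "return_station_id", "county": "return_county"}
)
trips = trips.merge(return_side, on="return_station_id", how="left")

print(trips)
```

Renaming the station columns before each merge keeps the rent and return attributes apart, so every trip ends up with both a `rent_county` and a `return_county`.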

Which Day of Week and Hour had the most frequent trips?

Let's start with the simplest question. We will look at trips taken by day of week, by hour, and by county, in that order.

Trips by Day of Week

There are 27.1% more trips taken on weekends than on weekdays. On weekdays, people rode more often on Tuesday and Wednesday. On weekends, more people came out to ride on Saturday than on Sunday.
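A comparison like this can be computed along the following lines. This is only a sketch on made-up timestamps, and it assumes the percentage compares the average number of trips per day on weekends against weekdays; the column name `rent_time` is also an assumption.

```python
import pandas as pd

# Made-up rent timestamps for illustration only (April 2021).
df = pd.DataFrame({"rent_time": pd.to_datetime([
    "2021-04-05 08:00",  # Monday
    "2021-04-06 09:00",  # Tuesday
    "2021-04-06 18:00",  # Tuesday
    "2021-04-10 14:00",  # Saturday
    "2021-04-10 15:00",  # Saturday
    "2021-04-10 16:00",  # Saturday
    "2021-04-11 11:00",  # Sunday
])})
df["date"] = df["rent_time"].dt.date
df["day_name"] = df["rent_time"].dt.day_name()
# dayofweek is 0 for Monday, so 5 and 6 are Saturday and Sunday.
df["day_type"] = df["rent_time"].dt.dayofweek.map(
    lambda d: "weekend" if d >= 5 else "weekday"
)

# Trips per day of week.
trips_by_day = df["day_name"].value_counts()

# Average trips per calendar day, weekend vs weekday (assumed definition).
daily = df.groupby(["day_type", "date"]).size()
avg_per_day = daily.groupby(level="day_type").mean()
pct_more = (avg_per_day["weekend"] / avg_per_day["weekday"] - 1) * 100

print(trips_by_day)
print(f"{pct_more:.1f}% more trips per day on weekends")
```

Averaging per calendar day matters because a month has more weekdays than weekend days, so raw totals would not be comparable.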

Trips by Hour

We can presume that the trip pattern might look different between weekdays and weekends. Let's compare them and see if the difference exists.
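One way to put the two patterns side by side is to turn hourly counts into shares, so that weekdays and weekends are compared on the same scale. The sketch below uses made-up timestamps, and the column names are assumptions.

```python
import pandas as pd

# Made-up rent timestamps for illustration only.
df = pd.DataFrame({"rent_time": pd.to_datetime([
    "2021-04-05 08:10", "2021-04-05 08:40", "2021-04-05 18:20",  # Monday
    "2021-04-10 14:00", "2021-04-10 15:30", "2021-04-10 15:45",  # Saturday
])})
df["hour"] = df["rent_time"].dt.hour
df["day_type"] = df["rent_time"].dt.dayofweek.map(
    lambda d: "weekend" if d >= 5 else "weekday"
)

# Share of each day type's trips that start in each hour (columns sum to 1).
hourly_share = pd.crosstab(df["hour"], df["day_type"], normalize="columns")

print(hourly_share)
```

Normalizing by column removes the difference in total volume, so only the shape of the hourly pattern is left to compare.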

On Weekdays

On Weekends

We can observe the following:

1.3. Does Hourly Trip Pattern Look Different by County?

Rent

Return

All counties have very similar hourly trip patterns.

The dark colors at commute time stand out, which makes me wonder:

which sums up to the following question:

1.4. Are there counties that have higher Rent or Return during commute time?

Let's extract the trips taken during commute time (8am and 6pm). In addition, let's look at the Rent to Return Ratio.

However, the two plots above order the y-axis differently, so it's quite difficult to compare the morning and evening commute times.
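The extraction step could look something like this. It is a sketch on toy data: the column names and the helper `rent_return_table` are hypothetical, and the ratio is expressed here as the rent share (rents divided by rents plus returns), which is one possible reading of a rent to return ratio.

```python
import pandas as pd

# Toy trips; column names are assumptions, not the real schema.
trips = pd.DataFrame({
    "rent_county":   ["Mapo", "Mapo", "Mapo", "Jung", "Jung", "Mapo"],
    "return_county": ["Jung", "Jung", "Mapo", "Mapo", "Jung", "Jung"],
    "rent_hour":     [8, 8, 8, 18, 18, 13],
    "return_hour":   [8, 8, 8, 18, 18, 13],
})

def rent_return_table(trips, hour):
    """Count rents and returns per county for one hour (hypothetical helper)."""
    rents = trips.loc[trips["rent_hour"] == hour, "rent_county"].value_counts()
    returns = trips.loc[trips["return_hour"] == hour, "return_county"].value_counts()
    # Counties missing on one side get a count of 0.
    table = pd.DataFrame({"rents": rents, "returns": returns}).fillna(0)
    table["rent_share"] = table["rents"] / (table["rents"] + table["returns"])
    return table

morning = rent_return_table(trips, 8)
evening = rent_return_table(trips, 18)

print(morning)
print(evening)
```

A rent share above 0.5 means a county sends out more bikes than it receives in that hour, and a share below 0.5 means the opposite.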

Let's visualize just the Rents this time because Returns = Trips - Rents.

The morning commute bars gradually decrease, while the evening commute bars gradually increase.

In other words, counties with higher Rent rate in the morning have higher Return rate in the evening (The correlation is 0.8).
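A correlation like this can be checked with a single pandas call. The numbers below are made up purely to show the mechanics (they are not the Seoul figures), and the column names are assumptions.

```python
import pandas as pd

# Made-up rent rates per county; NOT the real data.
rates = pd.DataFrame(
    {
        "morning_rent_rate": [0.62, 0.55, 0.48, 0.41],
        "evening_rent_rate": [0.40, 0.47, 0.51, 0.58],
    },
    index=["County A", "County B", "County C", "County D"],
)

# Returns = Trips - Rents, so the return rate is one minus the rent rate.
rates["evening_return_rate"] = 1 - rates["evening_rent_rate"]

# Pearson correlation between morning rent rate and evening return rate.
corr = rates["morning_rent_rate"].corr(rates["evening_return_rate"])
print(round(corr, 2))
```

A value close to 1 says that counties which send bikes out in the morning tend to be the ones that receive them back in the evening.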

We can imagine that people ride Seoul bikes from home -> work -> home on weekdays.

Well, this may be pretty obvious, but now we have a visualization to back up our guess!

2. Which County has High Trip Distance and Trip Hour?

We can assume that a county with a high average trip distance and trip hour has more riders who travel farther or ride longer.

Let's first look at the trip distance.

Trip Distance

Let's draw the distribution as a boxplot and a distplot.

The distribution is skewed due to outliers. Let's remove the outliers for the sake of better analysis.
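The outlier rule is not spelled out here, so the sketch below assumes the common 1.5 x IQR rule (the same rule a boxplot uses for its whiskers). The distances are made up.

```python
import pandas as pd

# Made-up trip distances in meters, with one extreme value.
distance_m = pd.Series([300, 900, 1500, 1800, 2100, 2600, 3200, 45000])

# Assumed rule: keep values within 1.5 * IQR of the quartiles.
q1, q3 = distance_m.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

trimmed = distance_m[distance_m.between(lower, upper)]

print(f"kept {len(trimmed)} of {len(distance_m)} trips")
print(f"median after trimming: {trimmed.median()} m")
```

Other cutoffs (a fixed percentile, or a hard limit on distance) would work too. The IQR rule is just a convenient default that does not need tuning.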

The distribution is still slightly skewed, and the median is 1798 meters.

Let's dig a little deeper by looking at distributions by county.

2.2 Trip Hour

Distribution of All Trips

Distribution without Outliers

Distributions for distance and hour look very similar. The mode is 5 minutes, and the median is 16 minutes.

Trip hour is very similar to trip distance.

It's interesting that some counties (Yongsan, Gangnam) have more outliers than others (Gangseo).

Putting both distance and trip hour into maps:

Which County Has High Outflow/Inflow of riders?

Outflow/Inflow represents trips by riders who traveled from one county to another (from county A to county B). This gives us the traffic moving across counties.

Since the number of trips varies widely by county, we will look at relative ratios between counties instead of absolute values.

In short, we want to know what percentage of Rents (Returns) at County A are Returned (Rented) in another county.
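In code, the two ratios described above come down to flagging trips that cross a county border and averaging that flag per county. This is a sketch on toy data with assumed column names.

```python
import pandas as pd

# Toy trips; column names are assumptions.
trips = pd.DataFrame({
    "rent_county":   ["Mapo", "Mapo", "Mapo", "Mapo", "Jung", "Jung"],
    "return_county": ["Mapo", "Jung", "Jung", "Mapo", "Jung", "Mapo"],
})

# True when the bike was returned in a different county than it was rented in.
trips["crosses"] = trips["rent_county"] != trips["return_county"]

# Outflow ratio: share of a county's rents that were returned elsewhere.
outflow_ratio = trips.groupby("rent_county")["crosses"].mean()

# Inflow ratio: share of a county's returns that were rented elsewhere.
inflow_ratio = trips.groupby("return_county")["crosses"].mean()

print(outflow_ratio)
print(inflow_ratio)
```

Taking the mean of a boolean column gives the fraction of `True` values, so no separate counting step is needed.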

Mapping Outflow/Inflow Ratio of Counties

We can first check which counties have high outflow ratio or inflow ratio.

Add Time

Let's throw time into the analysis and dig in for more insights.

The outflow and inflow ratios have slightly different patterns! This finding raises a question:

Since we have to look at the outflow and inflow ratios over the same time period, let's define

Measure = Inflow Ratio - Outflow Ratio

If Measure > 0, more riders are coming into the county in that time period.

If Measure < 0, more riders are going out of the county in that time period.
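Here is a rough sketch of how Measure could be computed per county and hour. It runs on toy data, the column names are assumptions, and for simplicity it assumes each trip is rented and returned within the same hour.

```python
import pandas as pd

# Toy trips; a single "hour" column is a simplifying assumption.
trips = pd.DataFrame({
    "rent_county":   ["Mapo", "Mapo", "Mapo", "Jung", "Jung", "Jung", "Jung", "Mapo"],
    "return_county": ["Jung", "Jung", "Mapo", "Jung", "Mapo", "Mapo", "Jung", "Mapo"],
    "hour":          [8, 8, 8, 8, 18, 18, 18, 18],
})
trips["crosses"] = trips["rent_county"] != trips["return_county"]

# Outflow and inflow ratios per county and hour (rows: county, columns: hour).
outflow = (
    trips.groupby(["rent_county", "hour"])["crosses"].mean()
    .unstack("hour").rename_axis("county")
)
inflow = (
    trips.groupby(["return_county", "hour"])["crosses"].mean()
    .unstack("hour").rename_axis("county")
)

# Measure = Inflow Ratio - Outflow Ratio
measure = inflow - outflow

print(measure)
```

In this toy example Jung has a positive Measure at 8 and a negative one at 18, while Mapo shows the reverse, which is the kind of mirror pattern to look for across counties.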

Let's cluster counties with similar patterns in Measure by time period.

Do you see the 3 clusters!? I've named them A, B, and C.

Now, let's plot the Measure by time.

Red = Outflow > Inflow

Green = Outflow < Inflow

The clusters make sense!

Cluster A represents counties with high inflow in morning commute time and high outflow in evening commute time. These counties tend to have relatively more companies than other counties.

Cluster B is the complete opposite of Cluster A. These counties are well known as residential areas.

Cluster C does not belong to either Cluster A or B. Patterns are not as clear as other clusters.

Some anomalies(?) are that:

One thing that crosses my mind is that these anomalies may be due to trucks redistributing bikes to stations in other counties.

In which County do Seoul-ers Commute?

Let's visualize this with a heatmap.
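The matrix behind such a heatmap can be built with a cross-tabulation of rent county against return county. The sketch below assumes we only keep morning commute trips (hour 8) and uses toy data with assumed column names.

```python
import pandas as pd

# Toy trips; column names are assumptions.
trips = pd.DataFrame({
    "rent_county":   ["Mapo", "Mapo", "Mapo", "Jung", "Mapo", "Jung"],
    "return_county": ["Jung", "Jung", "Mapo", "Jung", "Mapo", "Mapo"],
    "rent_hour":     [8, 8, 8, 8, 13, 18],
})

# Keep only morning commute trips (assumed filter).
morning = trips[trips["rent_hour"] == 8]

# Rows: where riders start. Columns: where they end up.
# normalize="index" makes each row sum to 1.
od_matrix = pd.crosstab(
    morning["rent_county"], morning["return_county"], normalize="index"
)

print(od_matrix)
```

Each row then reads as "of the morning trips starting in this county, what share ends in each county", and a matrix like this can be passed to a heatmap function such as seaborn's `heatmap`.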